Real-time event detection in massive streams
Grant award number EP/J020664/1
New event detection, also known as first story detection (FSD), has become very
popular in recent years. The task consists of finding previously unseen events from
a stream of documents. Despite the apparent simplicity, FSD is very challenging and
has applications wherever timely access to fresh information is crucial: from
journalism to stock market trading, homeland security, or emergency response. With
the rise of user generated content and citizen journalism we have entered an era of big
and noisy data, yet traditional approaches for solving FSD are not designed to deal
with this new type of data.
The amount of information generated today exceeds previously available datasets by
many orders of magnitude, making traditional approaches obsolete
for modern event detection. In this thesis, we propose a modern approach to event
detection that scales to unbounded streams of text, without sacrificing accuracy. This
is a crucial property that enables us to detect events from large streams like Twitter,
which none of the previous approaches were able to do.
One of the major problems in detecting new events is vocabulary mismatch, also
known as lexical variation. This problem is characterized by different authors using
different words to describe the same event, and it is inherent to human language. We
show how to mitigate this problem in FSD by using paraphrases. Our approach that
uses paraphrases achieves state-of-the-art results on the FSD task, while still maintaining
efficiency and being able to process unbounded streams.
Another important property of user generated content is the high level of noise,
and Twitter is no exception. This is another problem that traditional approaches were
not designed to deal with, and here we investigate different methods of reducing the
amount of noise. We show that by using information from Wikipedia, it is possible to
significantly reduce the amount of spurious events detected in Twitter, while maintaining
a very small latency in detection.
A question is often raised as to whether Twitter is at all useful, especially if one
has access to a high-quality stream such as the newswire, or if it should be considered
as a sort of poor man’s newswire. In our comparison of these two streams we find that
Twitter contains events not present in the newswire, and that it also breaks some events
sooner, showing that it is useful for event detection, even in the presence of newswire.
Can Twitter replace newswire for breaking news?
Twitter is often considered to be a useful source of real-time news, potentially replacing newswire for this purpose. But is this true? In this paper, we examine the extent to which news reporting in newswire and Twitter overlap and whether Twitter often reports news faster than traditional newswire providers. In particular, we analyse 77 days' worth of tweets and newswire articles with respect to both manually identified major news events and larger volumes of automatically identified news events. Our results indicate that Twitter reports the same events as newswire providers, in addition to a long tail of minor events ignored by mainstream media. However, contrary to popular belief, neither stream leads the other when dealing with major news events, indicating that the value that Twitter can bring in a news setting comes predominantly from increased event coverage, not timeliness of reporting.
Cut Length Distributions of Haylage Particles
Alfalfa is one of the most important crops for forage production. The traditional method of alfalfa conservation is hay preparation. However, nowadays it is also commonly processed in the form of silage and haylage. The physiological effects of forages included in diets depend on plant species, stage of maturity, method of preservation and diet composition. The physical characteristics of rations for ruminants are primarily influenced by the dietary forage to concentrate ratio, the type of forages and concentrates, and the mean particle size of feeds. The length distribution of forage particles is an important parameter in diet formulation for ruminants, especially for dairy cattle. During silage production, harvest considerations should focus on obtaining an adequate particle size distribution of the ensiled crop.
This paper presents the results of testing three contemporary types of self-propelled silage harvesters used in alfalfa haylage preparation: Claas Jaguar 950, Krone Big X 700 and Krone Big X 500. All machines were fitted with pick-up headers. The study analyzes the length distributions of chopped alfalfa particles. The resulting frequency distributions of the produced haylage are characterised by a high mass percentage of the fraction comprising the largest particles. It is also evident that the Claas Jaguar 950 harvester achieved the mean chopping length closest to the preset value.
Combines Work Quality in Maize Silage Production
The paper presents the test results of three silage combines employed in maize silage preparation in the Toplica region. It focuses on determining the technical working parameters of the tested machines. The results confirmed the superiority of the John Deere 5820 silage combine, which produced chopped mass with the smallest deviation of particle length from the preset cutting length. In this case, the average length of the chopped mass was 9.9 mm, with 69% of the mass in the range up to 8 mm. The other two silage combines produced a lower mass percentage of this fraction and larger variations of particle length with respect to the preset length. The lowest mass flow rate was recorded for the silage combine Fortschrit E-286: 7.3 kg s⁻¹ (26.3 t h⁻¹), with a surface productivity of 0.83 ha h⁻¹ at an average speed of 4.0 km h⁻¹. The highest production rate was achieved with the John Deere 5820 silage combine: 10.9 kg s⁻¹ (39.1 t h⁻¹) at an average working speed of 4.7 km h⁻¹ and a surface productivity of 1.21 ha h⁻¹.
Streaming first story detection with application to Twitter
With the recent rise in popularity and size of social media, there is a growing need for systems that can extract useful information from this amount of data. We address the problem of detecting new events from a stream of Twitter posts. To make event detection feasible on web-scale corpora, we present an algorithm based on locality-sensitive hashing which is able to overcome the limitations of traditional approaches, while maintaining competitive results. In particular, a comparison with a state-of-the-art system on the first story detection task shows that we achieve over an order of magnitude speedup in processing time, while retaining comparable performance. Event detection experiments on a collection of 160 million Twitter posts show that celebrity deaths are the fastest spreading news on Twitter.
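The abstract above names locality-sensitive hashing as the core idea but gives no details, so the following is only a minimal sketch of how a streaming novelty check of that general kind might look, not the authors' algorithm. It assumes random-hyperplane hashing for cosine similarity, a hashed bag-of-words representation, and a fixed novelty threshold. Every name and constant here (`StreamingDetector`, `DIM`, `NUM_BITS`, `NUM_TABLES`, `THRESHOLD`) is hypothetical and chosen for illustration.

```python
import math
import random
import zlib
from collections import defaultdict

# All constants below are illustrative assumptions, not values from the paper.
DIM = 256        # size of the hashed feature space
NUM_BITS = 8     # random hyperplanes (signature bits) per hash table
NUM_TABLES = 4   # number of independent hash tables
THRESHOLD = 0.6  # cosine distance above which a document counts as new


def vectorize(text):
    """Map text to a sparse, L2-normalised bag-of-words vector (dict)."""
    counts = defaultdict(float)
    for token in text.lower().split():
        counts[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {i: v / norm for i, v in counts.items()} if norm else {}


def cosine_distance(a, b):
    """Cosine distance between two already-normalised sparse vectors."""
    return 1.0 - sum(v * b.get(i, 0.0) for i, v in a.items())


class StreamingDetector:
    """Hypothetical LSH-backed novelty detector for a document stream."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One set of random Gaussian hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_BITS)]
            for _ in range(NUM_TABLES)
        ]
        self.tables = [defaultdict(list) for _ in range(NUM_TABLES)]

    @staticmethod
    def _signature(vec, planes):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(
            sum(v * plane[i] for i, v in vec.items()) >= 0.0 for plane in planes
        )

    def process(self, text):
        """Return (distance to nearest candidate, is_new) and store the doc."""
        vec = vectorize(text)
        best = 1.0  # distance used when no candidate shares a bucket
        keys = []
        for planes, table in zip(self.planes, self.tables):
            key = self._signature(vec, planes)
            keys.append(key)
            # Only documents in the same bucket are compared exactly.
            for other in table[key]:
                best = min(best, cosine_distance(vec, other))
        for key, table in zip(keys, self.tables):
            table[key].append(vec)
        return best, best > THRESHOLD
```

As a usage example, calling `process` twice on the same text flags the first call as new (no candidates exist yet) and the second as not new, since an identical document always lands in the same buckets and has a distance of approximately zero. Note that this sketch keeps every document in memory, so it does not by itself handle unbounded streams; some policy for capping bucket sizes would be needed for that.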
Comparison of Essential Metals in Different Pork Meat Cuts from the Serbian Market
Pork consumption in Serbia accounts for a large share of total meat consumption. Pork is a valuable source of nutrients. We analyzed the metal content of three different cuts of pork collected from the Serbian market during 2014. The isotopes of zinc (⁶⁶Zn), copper (⁶³Cu) and iron (⁵⁷Fe) were analyzed by ICP-MS. Our data show that Zn, Cu and Fe were present at significantly different levels in hind leg, loin and shoulder, and that shoulder meat was richest in the analyzed metals. The differing mineral status of different pork cuts implies differences in their nutritional benefits for the human diet.